

working  paper 
department 
of  economics 


SELF-CONFIRMING EQUILIBRIUM

Drew  Fudenberg 
David  K.  Levine 


No.  581 


June  1991 


massachusetts 
institute  of 

technology 

50  memorial  drive 
Cambridge,  mass. 02139 




SELF-CONFIRMING EQUILIBRIUM

by 

Drew  Fudenberg 

and 

David  K.  Levine* 


June  1991 


*Departments of Economics, Massachusetts Institute of Technology and
University of California, Los Angeles. We thank David Kreps for many helpful
conversations. National Science Foundation grants 89-07999-SES, 90-23697-SES,
88-08204-SES and 90-9008770-SES, and the Guggenheim Foundation provided
financial support.


Abstract 


Self-confirming equilibrium differs from Nash equilibrium in
allowing players to have incorrect beliefs about how their opponents
would play off the equilibrium path. We provide several examples
of ways that self-confirming and Nash equilibria differ. In games
with "identified deviators," all self-confirming equilibrium
outcomes can be generated by extensive-form correlated equilibria.
In two-player games, self-confirming equilibria with "unitary
beliefs" are Nash.


1.    Introduction 

Nash equilibrium and its refinements describe a situation in which (i)
each player's strategy is a best response to his beliefs about the play
of his opponents, and (ii) each player's beliefs about the opponents' play
are exactly correct. We propose a new equilibrium concept, self-confirming
equilibrium, that weakens condition (ii) by requiring only that players'
beliefs are correct along the equilibrium path of play. Thus, each player
may have incorrect beliefs about how his opponents would play in
contingencies that do not arise when play follows the equilibrium, and
moreover the beliefs of different players may be wrong in different ways.

The concept of self-confirming equilibrium is motivated by the idea
that non-cooperative equilibria should be interpreted as the outcome of a
learning process, in which players revise their beliefs using their
observations of previous play. Suppose that each time the game is played
the players observe the actions chosen by their opponents (or, more
generally, the terminal node of the extensive form), but that players do not
observe the actions the opponents would have played at the information sets
that were not reached along the path of play. Then, if a self-confirming
equilibrium occurs repeatedly, no player ever observes play that contradicts
his beliefs, so the equilibrium is "self-confirming" in the weak sense of
not being inconsistent with the evidence. By analogy with the literature on
the bandit problem (e.g. Rothschild [1974]) one might expect that a non-Nash
self-confirming equilibrium can be the outcome of plausible learning
processes. This point was made by Fudenberg and Kreps [1988], who gave an
example of a learning process that converges to a non-Nash outcome unless the
players engage in a sufficient amount of "experimentation" with actions that
do not maximize the current period's expected payoff. Our notion of
self-confirming equilibrium was developed to capture the implications of
learning when players do little or none of this experimentation.

To illustrate the relationship between Nash equilibrium and
self-confirming equilibrium, note first that in a one-shot simultaneous-move
game, every information set is reached along every path, so that
self-confirming equilibrium reduces to the Nash condition that beliefs are
correct at every information set. Somewhat less obvious is the fact that
self-confirming equilibria must have Nash outcomes in any two-player game,
so long as each player has "unitary beliefs," meaning that each strategy
that the player uses with positive probability is a best response to the
same (possibly incorrect) beliefs about the opponent's off-path play.

Unitary beliefs seem natural if we think of equilibrium as corresponding
to the outcome of a learning model with a single player 1 and a single
player 2, etc., as in Fudenberg and Kreps. We were led to consider the
alternative of heterogeneous beliefs (each strategy a player uses with
positive probability may be a best response to a different belief about his
opponents) by our [1990], [1991] study of learning in models where a large
number of individual players of each type are randomly matched with one
another each period. In such models, heterogeneous beliefs can arise
because different individuals have different learning experiences or
different prior beliefs. When heterogeneous beliefs are allowed, the
self-confirming equilibria of a two-player game need not be Nash equilibria of
the original game, but rather are Nash equilibria of an extended version of
the game in which players can observe the outcome of certain correlating
devices. Moreover, the self-confirming equilibrium outcomes are a subset
of the outcomes of Forges's [1986] extensive-form correlated equilibria.


This inclusion does not obtain in general n-player games, as shown by
the example of Fudenberg and Kreps. However, it does obtain in any game
which has "identified deviators," meaning that deviations by different
players cannot lead to the same information set. In these games, moreover,
every outcome of a self-confirming equilibrium with unitary beliefs is Nash,
provided that each player's subjective uncertainty corresponds to independent
randomizations by his opponents. (This independence condition is
difficult to explain informally; it is discussed at length on page 7.)

Since self-confirming equilibrium requires beliefs be correct along the
equilibrium path of play, it is inherently an extensive-form solution
concept, in contrast to Nash equilibrium, which can be defined on the
strategic form of the game. Indeed, two extensive-form games with the same
strategic form can have different sets of self-confirming equilibria. This
conclusion runs counter to the argument, recently popularized by Kohlberg
and Mertens [1986], that the strategic form encodes all strategically
relevant information, and two extensive forms with the same strategic form
will be played in the same way. However, dependence on the extensive form
is natural when equilibrium is interpreted as the result of learning, as the
strategic form does not pin down how much of the opponents' strategies each
player will observe when the game is played. In our view, the contrast
between our approach and that of Kohlberg and Mertens shows that it is
better to specify the process that leads to equilibrium play before
deciding which games are equivalent or which equilibria are most reasonable.

The idea of self-confirming equilibrium with unitary beliefs is
implicit in the work of Fudenberg and Kreps; our contribution here is to
give a formal definition of the concept and explore its properties in
various classes of games. We first noted the way that heterogeneous beliefs
could allow for a form of extensive-form correlation in our [1990] paper on
steady-state learning. Battigalli and Guaitoli's [1988] conjectural
equilibrium, and Rubinstein and Wolinsky's [1990] rationalizable
conjectural equilibrium are also motivated by the idea that equilibrium
corresponds to the steady state of a learning model. Their work differs
from ours in considering a more general formulation of the information
players observe about one another's strategies when the game is played and
in restricting attention to unitary beliefs. Kalai and Lehrer's [1991]
concept of a private-beliefs equilibrium assumes that beliefs are both
independent and unitary; they extend our observation that such equilibria
have Nash outcomes in multi-stage games by allowing for beliefs that are
only approximately correct on the path of play. Canning's [1990] social
equilibrium is also motivated by the idea that equilibrium corresponds to a
steady state of a learning system; his concept differs in being defined in
terms of the learning process, instead of being a reduced-form notion
defined on the original game.

Although the motivation for self-confirming equilibrium is the idea
that equilibrium is the result of learning, this paper discusses only the
equilibrium concept and not its learning-theoretic foundations. Our [1991]
paper considers the steady states of a system in which a fixed stage game is
played repeatedly by a large population of players who are randomly matched
with one another, and learn about their opponents' strategies by observing
play in their own matches. Individual players remain in the population a
finite number of periods; new players enter to keep the total population
size constant. Entering players believe that they face a steady-state
distribution on the opponents' play, and update their exogenous priors over
the true steady state using Bayes' rule. Given their beliefs, players choose
their strategies in each period to maximize their expected present value; in
particular any "experiments" must be optimal even though they may have
short-run costs.

Our [1991] paper shows that if lifetimes are long, then steady states
approximate those of self-confirming equilibria. If in addition players are
very patient, they do enough experimentation that they learn the relevant
aspects of off-path play, and steady states approximate Nash equilibria.

2.    The  Stage  Game 

The stage game is an $I$-player extensive-form game of perfect
recall. The game tree $X$, with nodes $x \in X$, is finite. The terminal nodes
are $z \in Z \subset X$. Information sets, denoted by $h \in H$, are disjoint
subsets of $X \setminus Z$. The information sets where player $i$ has the move
are $H_i \subset H$, and $H_{-i} = H \setminus H_i$ are information sets for
other players. The feasible actions at information set $h_i \in H_i$ are
denoted $A(h_i)$; $A_i = \bigcup_{h_i \in H_i} A(h_i)$ is the set of all
feasible actions for player $i$, and $A_{-i} = \bigcup_{j \ne i} A_j$ are the
feasible actions to player $i$'s opponents.

A pure strategy for player $i$, $s_i$, is a map from information sets in
$H_i$ to actions satisfying $s_i(h_i) \in A(h_i)$; $S_i$ is the set of all such
strategies. We let $s \in S = \times_{i=1}^{I} S_i$ denote a strategy profile
for all players, and $s_{-i} \in S_{-i} = \times_{j \ne i} S_j$. Each player $i$
receives a payoff in the stage game that depends on the terminal node.
Player $i$'s payoff function is denoted $u_i : Z \to \mathbb{R}$; each player
knows his own payoff function. Let $\Delta(\cdot)$ denote the space of
probability distributions over a set. Then a mixed strategy profile is
$\sigma \in \times_{i=1}^{I} \Delta(S_i)$.


Let $Z(s_i)$ be the subset of terminal nodes that are reachable when $s_i$ is
played. Let $H(s_i)$ be the set of all information sets that can be reached if
$s_i$ is played.

We will also need to refer to the information sets that are reached
with positive probability under $\sigma$, denoted $H(\sigma)$. Notice that if
$\sigma_{-i}$ is completely mixed, then $H(s_i, \sigma_{-i}) = H(s_i)$, as every
information set that is potentially reachable given $s_i$ has positive
probability.

In addition to mixed strategies, we define behavior strategies. A
behavior strategy for player $i$, $\pi_i$, is a map from information sets in
$H_i$ to probability distributions over moves: $\pi_i(h_i) \in \Delta(A(h_i))$,
and $\Pi_i$ is the set of all such strategies. As with pure strategies,
$\pi \in \Pi = \times_{i=1}^{I} \Pi_i$, and $\pi_{-i} \in \Pi_{-i} =
\times_{j \ne i} \Pi_j$. Let $p(z \mid \pi)$ be the probability that $z$ is
reached under profile $\pi$; define $p(x \mid \pi)$ analogously. (Note that the
probability $p$ will reflect the probability distribution on nature's moves.)

Since the game has perfect recall, each mixed strategy $\sigma_i$ induces a
unique equivalent behavior strategy denoted $\pi_i(\cdot \mid \sigma_i)$. In
other words, $\pi_i(h_i \mid \sigma_i)$ is the probability distribution over
actions at $h_i$ induced by $\sigma_i$.


We will suppose that all players know the structure of the extensive
form, and so in particular know the strategy spaces of their opponents. We
have already assumed players know their own payoff function and the
probability distribution on nature's moves, so the only uncertainty each
player faces concerns the strategies his opponents will play. To model this
"strategic uncertainty," we let $\mu_i$ be a probability measure over
$\Pi_{-i}$, the set of other players' behavior strategies. Fix $s_i$. Then the
marginal probability of a terminal node $z$ is

(2.1)    $p_i(z \mid s_i, \mu_i) = \int_{\Pi_{-i}} p_i(z \mid s_i, \pi_{-i}) \, \mu_i(d\pi_{-i})$.

This in turn gives rise to preferences

(2.2)    $u_i(s_i, \mu_i) = \sum_{z \in Z(s_i)} p_i(z \mid s_i, \mu_i) \, u_i(z)$.

It is important to note that even though the beliefs $\mu_i$ are over
opponents' behavior strategies, and thus reflect player $i$'s knowledge that
his opponents choose their randomizations independently, the marginal
distribution $p(\cdot \mid s_i, \mu_i)$ over terminal nodes can involve
correlation between the opponents' play. For example, if players 2 and 3
simultaneously choose between U and D, player 1 might assign probability 1/4
to $\pi_2(U) = \pi_3(U) = 1$, and probability 3/4 to $\pi_2(U) = \pi_3(U) = 1/2$.
Even though both profiles in the support of $\mu_1$ suppose independent
randomization by players 2 and 3, the marginal distribution on their joint
actions is $p(U,U) = 7/16$ and $p(U,D) = p(D,U) = p(D,D) = 3/16$, which is a
correlated distribution. This correlation reflects a situation where player 1
believes some unobserved common factor has helped determine the play of both
of his opponents. If, as we have supposed, the opponents are in fact
randomizing independently, we should expect player 1 to learn this if he
obtains sufficiently many observations. However, if few or no observations
are accumulated, the correlation in the predicted marginal distribution can
persist.
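
The arithmetic of this example can be checked with a short Python sketch.
It is only an illustration of equation (2.1) for a belief with finite
support; the names `belief`, `action_prob` and `marginal` are ours and do
not appear in the text.

```python
from fractions import Fraction
from itertools import product

# Player 1's belief mu_1: a finite-support measure over independent
# behavior-strategy profiles (pi_2(U), pi_3(U)) of players 2 and 3.
belief = [
    (Fraction(1, 4), (Fraction(1), Fraction(1))),        # both play U for sure
    (Fraction(3, 4), (Fraction(1, 2), Fraction(1, 2))),  # both mix 50-50
]

def action_prob(p_up, action):
    """Probability of `action` when U is played with probability p_up."""
    return p_up if action == "U" else 1 - p_up

def marginal(belief):
    """Average the product distributions against the belief, as in (2.1)."""
    joint = {}
    for a2, a3 in product("UD", repeat=2):
        joint[(a2, a3)] = sum(
            weight * action_prob(p2, a2) * action_prob(p3, a3)
            for weight, (p2, p3) in belief
        )
    return joint

joint = marginal(belief)

# Marginal probability that each opponent plays U.
p2_up = joint[("U", "U")] + joint[("U", "D")]
p3_up = joint[("U", "U")] + joint[("D", "U")]

# Under independence, joint[(U, U)] would equal p2_up * p3_up.
is_independent = joint[("U", "U")] == p2_up * p3_up
```

Each opponent plays U with marginal probability 5/8, so an independent
distribution would put probability 25/64 on (U,U), whereas the belief puts
7/16 = 28/64 on it.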


3.    Self-Confirming Equilibrium and Consistent Self-Confirming Equilibrium


One way to define a Nash equilibrium is as a mixed profile $\sigma$ such that
for each $s_i \in \mathrm{support}(\sigma_i)$ there exist beliefs $\mu_i$ such that

$s_i$ maximizes $u_i(\cdot, \mu_i)$, and

$\mu_i(\{\pi_{-i} \mid \pi_j(h_j) = \pi_j(h_j \mid \sigma_j)\}) = 1$ for all $h_j \in H_{-i}$.

In other words, each player optimizes given his beliefs, and his beliefs are
a point mass on the true distribution.


One of the goals of this paper is to introduce the notion of a
self-confirming equilibrium, which weakens Nash equilibrium by relaxing the
second requirement above. Instead of requiring that beliefs are correct at
each information set, self-confirming equilibrium requires only that, for
each $s_i$ that is played with positive probability, beliefs are confirmed by
the information revealed when $s_i$ and $\sigma_{-i}$ are played, which we take
to be the corresponding distribution on terminal nodes $p(s_i, \sigma_{-i})$.
This corresponds to the idea that the terminal node reached is observed at
the end of each play of the game.

The  idea  that  beliefs  need  only  be  correct  along  the  path  is  a  natural 
consequence  of  a  Bayesian  approach  to  the  formation  of  forecasts  we  study 
below:  Bayesian  learning  should  not  be  expected  to  lead  to  correct  beliefs 
about  play  at  information  sets  that  are  never  reached. 


Definition 1: Profile $\sigma$ is self-confirming if for each
$s_i \in \mathrm{support}(\sigma_i)$ there exist beliefs $\mu_i$ such that

(i) $s_i$ maximizes $u_i(\cdot, \mu_i)$, and

(ii) $\mu_i[\{\pi_{-i} \mid \pi_j(h_j) = \pi_j(h_j \mid \sigma_j)\}] = 1$ for all $j \ne i$ and $h_j \in H(s_i, \sigma_{-i})$.


Condition (ii) requires that player $i$'s beliefs be concentrated on the
subset of $\Pi_{-i}$ that coincides with the true distribution at information
sets that are reached with positive probability when player $i$ plays $s_i$.
His beliefs about play at other information sets need not be concentrated on a
single behavior strategy, and at these information sets his beliefs can
incorporate correlation of the kind discussed in the last section. We
emphasize that each $s_i \in \mathrm{support}(\sigma_i)$ may be confirmed by a
different belief $\mu_i$. In the definition of Nash equilibrium, this
flexibility is vacuous, as each $\mu_i$ must be exactly correct; the
flexibility matters once beliefs are allowed to be wrong. This diversity of
beliefs is natural in a learning model with populations of each type of
player: different player $i$'s may have different beliefs, either due to
different priors or to different observations.

If the same beliefs $\mu_i$ can be used to rationalize each
$s_i \in \mathrm{support}(\sigma_i)$, we will say that the equilibrium has
unitary beliefs. This restriction corresponds to learning models with a single
player of each type. We will occasionally speak of heterogeneous beliefs when
we want to emphasize that beliefs need not be unitary. A self-confirming
equilibrium is independent if for all sets $\hat{\Pi}_j \subset \Pi_j$,
$\mu_i(\times_{j \ne i} \hat{\Pi}_j) = \prod_{j \ne i} \mu_i(\hat{\Pi}_j)$, so
that learning player $j$'s behavior strategy would not change $i$'s beliefs
about the behavior strategies of his other opponents.

Definition 2: Profile $\sigma$ is a consistent self-confirming equilibrium if
for each $s_i \in \mathrm{support}(\sigma_i)$ there are beliefs $\mu_i$ such that

(i) $s_i$ maximizes $u_i(\cdot, \mu_i)$, and

(ii') $\mu_i[\{\pi_{-i} \mid \pi_j(h_j) = \pi_j(h_j \mid \sigma_j)\}] = 1$ for all $j \ne i$ and $h_j \in H(s_i)$.


In words, self-confirming equilibrium requires that for each $s_i$ that
player $i$ gives positive probability, player $i$ correctly forecasts play at
all information sets that will be reached with positive probability under
profile $(s_i, \sigma_{-i})$. Consistent self-confirming equilibrium requires
further that player $i$'s beliefs be correct at all information sets that could
possibly be reached when he plays $s_i$ under some play of the opponents. This
stronger requirement captures the information player $i$ would obtain in a
learning model if his opponents play each of their strategies sufficiently
often. We call this a "consistent" equilibrium because in this case if both
player $i$ and player $j$ can unilaterally deviate and cause information set $h$
to be reached, then both players' beliefs about play at $h$ are correct, and
in particular are equal to each other.

Note that in a one-shot simultaneous-move game, all information sets
are on the path of every profile, so the sets $H(s_i, \sigma_{-i})$ are all of
$H$, and condition (ii) requires that beliefs be exactly correct. Hence in
these games, all self-confirming equilibria are Nash. In more general games,
the self-confirming equilibria can be a larger set, as shown by the examples
of the next section.

4.    The  Characterization  of  Self -Confirming  Equilibria 

This section examines the properties of self-confirming equilibria. We
begin with an example of a self-confirming equilibrium that is not
consistent self-confirming. The example has the property that one player
cannot distinguish between deviations by two of his opponents; we show that
in the opposite case of "identified deviators" any self-confirming
equilibrium is consistent self-confirming. We then provide several examples
of ways in which consistent self-confirming equilibria can fail to be Nash,
and show that all consistent self-confirming equilibria have outcomes that
can be supported by the extensive-form correlated equilibria defined by
Forges [1986]. Finally, we show that consistent self-confirming equilibria
with independent, unitary beliefs have the same outcomes as Nash equilibria.

Example 1 [Fudenberg-Kreps]: In the three-player game illustrated in Figure
1, player 1 moves first. If he plays A, player 2 moves next; if he plays D,
player 3 gets the move. If player 2 gets the move, he can either play A,
which ends the game, or play D, which gives the move to player 3. The key
feature of the game is that if player 3 gets the move, he cannot tell
whether player 1 played D, or player 1 played A and player 2 played D.

Fudenberg and Kreps [1988] use this game to show that learning need not
lead to Nash equilibrium even if players are long-lived. Suppose that
player 1 expects player 3 to play R and player 2 expects player 3 to play L.
Given these beliefs, it is optimal for players 1 and 2 to play $A_1$ and $A_2$.
Moreover, $(A_1, A_2)$ is a self-confirming equilibrium. However, it is not a
Nash equilibrium outcome: Nash equilibrium requires players 1 and 2 to have
the same (correct) beliefs about player 3's play, and if both have the same
beliefs, at least one of the players must choose D. (If the beliefs assign
probability more than 1/3 to L and 2 plays A, then 1 plays D, while if the
beliefs assign probability more than 1/3 to R and 1 plays A then 2 plays D.)
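
The 1/3 thresholds in the parenthetical remark can be illustrated with a
small Python sketch. Figure 1 is not reproduced in this text, so the payoffs
below are hypothetical: they are chosen only so that player 1 prefers D
exactly when L has probability above 1/3, and player 2 prefers D exactly when
R has probability above 1/3, as stated above. The names
`a_is_best_response` and `common_beliefs` are ours.

```python
from fractions import Fraction

# Hypothetical payoffs (u_1, u_2), an assumption chosen to match the 1/3
# thresholds in the text; they are not taken from Figure 1.
PAYOFF_AA = (1, 1)                           # both play A and the game ends
PAYOFF_AFTER_D = {"L": (3, 0), "R": (0, 3)}  # player 3 moves after a D

def value_of_d(player, prob_l):
    """Expected payoff to `player` (0 for player 1, 1 for player 2) from
    playing D while the other plays A, when player 3 is believed to play L
    with probability prob_l."""
    return (prob_l * PAYOFF_AFTER_D["L"][player]
            + (1 - prob_l) * PAYOFF_AFTER_D["R"][player])

def a_is_best_response(player, prob_l):
    """True if A does at least as well as D for `player` under the belief."""
    return PAYOFF_AA[player] >= value_of_d(player, prob_l)

# Different beliefs: player 1 expects R, player 2 expects L.
supported_by_different_beliefs = (
    a_is_best_response(0, Fraction(0)) and a_is_best_response(1, Fraction(1))
)

# Common beliefs: search a grid of probabilities of L for a belief under
# which both players are willing to play A.
grid = [Fraction(k, 100) for k in range(101)]
common_beliefs = [q for q in grid
                  if a_is_best_response(0, q) and a_is_best_response(1, q)]
```

Under these assumed payoffs, A is optimal for player 1 only if L has
probability at most 1/3 and for player 2 only if L has probability at least
2/3, so no common belief on the grid supports $(A_1, A_2)$.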

When  this  example  has  been  presented  in  seminars,  the  following 
question  has  frequently  been  raised:   Shouldn't  player  2  revise  his  beliefs 
about  player  3  in  the  direction  of  3  playing  R  when  he  sees  player  1  play  A? 
And,  in  the  spirit  of  the  literature  on  the  impossibility  of  players 
"agreeing  to  disagree"  (Aumann  [1976],  Geanakoplos  and  Polemarchakis 
[1982],  and  so  forth)  shouldn't  players  1  and  2  end  up  with  the  same  beliefs 
about  player  3's  strategy? 

Our response is to note that, while this sort of indirect learning
could occur in our model, it need not. First, the indirect learning
supposes that players know (or have strong beliefs about) one another's
payoffs, which is consistent with our model but is not necessarily the case.
Second, even if player 2 knows player 1's payoffs, and hence is able to
infer that player 1 believes player 3 will play R, it is not clear that this
will lead player 2 to revise his own beliefs. It is true that player 2 will
revise his beliefs if he views the discrepancy between his own beliefs and
player 1's as due to information that player 1 has received but player 2 has
not, but player 2 might also believe that player 1 has no objective reason
for his beliefs, but has simply made a mistake. The "agreeing to disagree"
literature ensures all differences in beliefs are attributable to objective
information by supposing that the players' beliefs are consistent with
Bayesian updating from a common prior distribution. But when equilibrium is
interpreted as the result of learning, the assumption of a common prior is
inappropriate. Indeed, the question of whether learning leads to Nash
equilibrium can be rephrased as the question of whether learning leads to
common posterior beliefs starting from arbitrary priors. (To emphasize this
point, recall that assuming players have a common prior distribution over
one another's strategies is equivalent to assuming that the beliefs
correspond to a correlated equilibrium (Aumann [1987]), and assuming an
independent common prior is equivalent to Nash equilibrium (Brandenburger
and Dekel [1987]).)

While $(A_1, A_2)$ in Example 1 is a self-confirming equilibrium (with
unitary beliefs), it is not a consistent self-confirming equilibrium, as
players 1 and 2 have different beliefs about player 3's play yet player 3's
information set $h_3$ belongs to both $H(A_1)$ and $H(A_2)$. The reason that
this inconsistency matters is that both player 1 and player 2 can cause $h_3$
to be reached by deviating from the equilibrium path. Thus the game does not
have "observed deviators" in the sense of the following definition.

Definition 3: A game has observed deviators if for all players $i$, all
strategy profiles $s$ and all deviations $s_i' \ne s_i$,
$h \in H(s_i', s_{-i}) \setminus H(s)$ implies that there is no $s_{-i}'$ with
$h \in H(s_i, s_{-i}')$.

In words, this definition says that if some deviation from $s$ by player
$i$ leads to a new information set $h$, then the information set can only be
reached if player $i$ deviates. Games of perfect information satisfy this
condition, as do repeated games with observed actions. More generally, the
condition is satisfied by all "multi-stage games with observed actions,"
meaning that the extensive form can be parsed into "stages" with the
properties that the beginning of each stage corresponds to a proper subgame
(Selten [1975]), and that within each stage all players move
simultaneously.² The following result shows that the condition is also
satisfied in all two-player games of interest:

Lemma 1: Every two-player game of perfect recall has observed deviators.

Proof: Suppose to the contrary that there exists a profile $s = (s_1, s_2)$
and an information set $h$ such that $h \notin H(s_1, s_2)$, but
$h \in H(s_1, s_2')$ for some $s_2'$ and $h \in H(s_1', s_2)$ for some $s_1'$.
If $h \in H_1$ then player 1 cannot distinguish between $s_1$ and $s_1'$, while
$h \in H_2$ implies that player 2 cannot distinguish between $s_2$ and $s_2'$;
either case contradicts perfect recall. ■
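
Definition 3 can be checked by brute force on small games. The Python sketch
below does this for the tree of Example 1, using only the structure described
in the text (no payoffs are needed), and for a hypothetical perfect-information
variant in which player 3 can tell who moved D. The function names and the
labels h1, h2, h3, h3a, h3b are ours.

```python
from itertools import product

def has_observed_deviators(strategy_sets, reached):
    """Check Definition 3 by enumeration over pure strategy profiles.

    strategy_sets: one list of pure strategies per player.
    reached: maps a profile (a tuple) to the set of information sets H(s).
    Returns (True, None), or (False, witness) for the first violation found.
    """
    profiles = list(product(*strategy_sets))
    for s in profiles:
        for i, own in enumerate(strategy_sets):
            for dev in own:
                if dev == s[i]:
                    continue
                deviated = s[:i] + (dev,) + s[i + 1:]
                for h in reached(deviated) - reached(s):
                    # h must be unreachable whenever player i sticks to s_i.
                    for t in profiles:
                        if t[i] == s[i] and h in reached(t):
                            return False, (i + 1, s, dev, h, t)
    return True, None

def reached_example1(s):
    """H(s) in Example 1: h3 is reached after D by player 1 or by player 2."""
    s1, s2, _ = s
    sets = {"h1"}
    if s1 == "A":
        sets.add("h2")
    if s1 == "D" or s2 == "D":
        sets.add("h3")
    return frozenset(sets)

def reached_perfect_info(s):
    """Hypothetical variant: player 3 observes which player chose D."""
    s1, s2, _ = s
    sets = {"h1"}
    if s1 == "D":
        sets.add("h3a")
    else:
        sets.add("h2")
        if s2 == "D":
            sets.add("h3b")
    return frozenset(sets)

# Player 3's choice never affects which information sets are reached, so a
# single L/R label stands in for his strategy in both games.
STRATEGIES = [["A", "D"], ["A", "D"], ["L", "R"]]

example1_ok, example1_witness = has_observed_deviators(STRATEGIES, reached_example1)
variant_ok, _ = has_observed_deviators(STRATEGIES, reached_perfect_info)
```

For Example 1 the check fails: starting from (A, A), player 1's deviation to
D reaches h3, but so does a deviation by player 2 alone. The
perfect-information variant passes, in line with the remark that games of
perfect information have observed deviators.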

Theorem 1: In games with observed deviators, self-confirming equilibria are
consistent self-confirming.

Proof: The idea of the proof is that a player's beliefs about play at
information sets he cannot cause to be reached do not influence his play.
Since the game has observed deviators, only one player $i(h)$ can cause a
given information set to be reached. So, starting from a self-confirming
equilibrium, we construct new beliefs in which all players have the same
beliefs about play at $h$ as player $i(h)$ did in the original equilibrium.

To make this precise, suppose that $\sigma$ is a self-confirming equilibrium.
Then for each player $i$ and each $s_i \in \mathrm{support}(\sigma_i)$ there is
a $\mu_i$ that satisfies (i) and (ii) of Definition 1. We will define new
beliefs $\mu_i'$ that coincide with $\mu_i$ except on
$H(s_i) \setminus H(s_i, \sigma_{-i})$, and assign probability 1 to the true
play $\pi(h \mid \sigma_{-i})$ at information sets in
$H(s_i) \setminus H(s_i, \sigma_{-i})$. To do this, let
$Q = H(s_i) \setminus H(s_i, \sigma_{-i})$, let $\Pi_{-i}^{Q}$ be the projection
of $\Pi_{-i}$ onto $Q$, and let $\mu_i^{Q}$ be the marginal distribution $\mu_i$
induces on $\Pi_{-i}^{Q}$. Let $\Pi_{-i}^{P}$ be the projection of $\Pi_{-i}$
onto $H(s_i, \sigma_{-i})$, and let $\mu_i^{P}$ be the distribution on
$\Pi_{-i}^{P}$ that assigns probability 1 to $\pi_j(h_j \mid \sigma_j)$ at each
$h_j \in H(s_i, \sigma_{-i})$. Finally, set $\mu_i' = \mu_i^{Q} \times \mu_i^{P}$.

Since $\mu_i$ satisfies condition (ii) of Definition 1, $\mu_i'$ satisfies the
stronger condition (ii'). Condition (ii) also implies that
$H(s_i, \mu_i) = H(s_i, \sigma_{-i})$, that is, player $i$ correctly predicts the
equilibrium path of play when he plays $s_i$, and it is then clear from the
definition of $\mu_i'$ that $H(s_i, \mu_i') = H(s_i, \mu_i)$ as well. Moreover,
information sets in $H(s_i) \setminus H(s_i, \mu_i')$ are not reached under
$(s_i, \mu_i')$, but can be reached if some player $j$ deviates. Since the game
has observed deviators, these information sets cannot be reached if none of
player $i$'s opponents deviate, that is, the information sets are not in
$H(s_i', \mu_i')$ for any $s_i'$. Hence player $i$'s expected payoff to every
action is the same under $\mu_i$ and $\mu_i'$, so $s_i$ is a best response
to $\mu_i'$.


Corollary: Self-confirming equilibria are consistent self-confirming in all
two-player games of perfect recall.

Even consistent self-confirming equilibria need not be Nash. There are
two reasons for this difference. First, consistent self-confirming
equilibrium allows a player's uncertainty about his opponents' strategies to
be correlated, while Nash equilibrium requires that the beliefs be a point
mass on a behavior strategy profile.

Example 2 [Untested Correlation]: In the game in Figure 2, player 1 can
play A, which ends the game, or play $L_1$, $M_1$, or $R_1$, all of which lead
to a simultaneous-move game between players 2 and 3, neither of whom observes
player 1's action. In this game, A is a best response to the correlated
distribution $p(L_2, L_3) = p(R_2, R_3) = 1/2$. Thus if player 1's prior beliefs
are either that 2 and 3 always play L, or that they always play R, then
player 1's best response is to play A, and so A is the outcome of a
self-confirming equilibrium.

However, we claim that A is not a best response to any strategy profile
for players 2 and 3. Verifying this is straightforward but tedious: Let $p_2$
and $p_3$ be the probabilities that players 2 and 3, respectively, assign to
$L_2$ and $L_3$. In order for A to be a best response, the following 3
inequalities must be satisfied:

(4.1)   $4[p_2 p_3 - (1-p_2)(1-p_3)] \le 1$, or $p_2 + p_3 \le 5/4$,

(4.2)   $4[-p_2 p_3 + (1-p_2)(1-p_3)] \le 1$, or $p_2 + p_3 \ge 3/4$, and

(4.3)   $p_2(1-p_3) + (1-p_2)p_3 \le 1/3$.

We will show that when constraints (4.1) and (4.2) are satisfied, (4.3)
cannot be. For any $p_2 < 1/2$, the left-hand side of (4.3) is increasing in
$p_3$, so it is minimized when $p_3$ is as small as possible, that is, for
$p_3(p_2) = 3/4 - p_2$. The minimized value is $2p_2^2 - (3/2)p_2 + 3/4$, and
this expression is minimized over $p_2$ at $p_2 = p_3 = 3/8$. At this point
the left-hand side of (4.3) equals 15/32 > 1/3. The case $p_2 > 1/2$ is
symmetric, and at $p_2 = 1/2$ the left-hand side equals 1/2.
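
The argument above can also be checked numerically. The following is a
minimal sketch, not part of the original proof: it uses only the three
inequalities (4.1)-(4.3), evaluates the left-hand side of (4.3) on a grid of
exact fractions, and keeps the points that satisfy (4.1) and (4.2). The
function names and the grid size are illustrative assumptions.

```python
from fractions import Fraction


def lhs_43(p2, p3):
    """Left-hand side of (4.3): probability that players 2 and 3 mismatch."""
    return p2 * (1 - p3) + (1 - p2) * p3


def satisfies_41_42(p2, p3):
    """(4.1) and (4.2) together are equivalent to 3/4 <= p2 + p3 <= 5/4."""
    return Fraction(3, 4) <= p2 + p3 <= Fraction(5, 4)


def min_lhs_on_grid(n=80):
    """Smallest left-hand side of (4.3) over grid points k/n that satisfy
    (4.1) and (4.2). Returns the tuple (value, p2, p3)."""
    grid = [Fraction(k, n) for k in range(n + 1)]
    return min(
        (lhs_43(p2, p3), p2, p3)
        for p2 in grid
        for p3 in grid
        if satisfies_41_42(p2, p3)
    )


value, p2, p3 = min_lhs_on_grid()
print(value, p2, p3)
```

With n = 80 the grid contains the point $p_2 = p_3 = 3/8$, where the
left-hand side of (4.3) is 15/32. A grid search only illustrates the claim;
the analytic argument is what establishes it for all $(p_2, p_3)$.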

We stress that the correlation in this example need not describe a
situation in which player 1 believes that players 2 and 3 actually correlate
their play. To the contrary, player 1 might be certain that they do not do
so, and that she could learn which (uncorrelated) strategy profile they are
using by giving them the move a single time. These competing explanations
for the correlation (call them "objective" correlation and "subjective"
correlation) cannot be distinguished in a static, reduced-form model of the
kind considered in this paper. However, our [1991] paper on steady-state
learning shows that the non-Nash outcome of example 2 can be the steady
state of a learning process where players are certain that their opponents'
actual play is an uncorrelated behavior profile.

In addition to untested correlation, there is another way that
consistent self-confirming equilibria can fail to be Nash, which arises
because the self-confirming concept allows each $s_i$ that player i assigns
positive probability to be a best response to different beliefs. This
possibility allows for non-Nash play even in two-player games. The most
immediate consequence of these differing beliefs is a form of
convexification, as in the following example.

Example 3 [Public Randomization]: In the game in Figure 3, player 1 can end
the game by moving L or he can give player 2 the move by choosing R. Player
1 should play L if he believes 2 will play D, and should play R if he
believes 2 will play U. If player 1 plays R with positive probability,
player 2's unique best response is to play U, so there are two Nash
equilibrium outcomes, (L) and (R,U). The mixed profile ((1/2 L, 1/2 R), U)
is a self-confirming equilibrium whose outcome is a convex combination of
the Nash outcomes: Player 1 plays L when he expects player 2 to play D, and
R when he expects 2 to play U, and when he plays L his forecast of D is not
disconfirmed. (Moreover, this equilibrium is clearly independent.)
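
The logic of Example 3 can be illustrated with a short check. This is a
sketch under stated assumptions: Figure 3 is not reproduced in the text, so
the payoff numbers below are hypothetical and chosen only to match the verbal
description (L is best against D, R is best against U, and U is player 2's
unique best response after R). The function names are illustrative as well.

```python
# Hypothetical payoffs (u1, u2). Figure 3 is not reproduced in the text, so
# these numbers are assumptions that merely match the verbal description.
PAYOFFS = {
    ("L", "U"): (2, 0),
    ("L", "D"): (2, 0),
    ("R", "U"): (3, 1),
    ("R", "D"): (0, 0),
}


def u1(a1, prob_u):
    """Player 1's expected payoff from a1 if player 2 plays U with prob_u."""
    return (prob_u * PAYOFFS[(a1, "U")][0]
            + (1 - prob_u) * PAYOFFS[(a1, "D")][0])


def is_best_reply_1(a1, prob_u):
    return all(u1(a1, prob_u) >= u1(b, prob_u) for b in ("L", "R"))


def player2_optimal(sigma1, prob_u):
    """If R has positive probability, every action player 2 uses must
    maximize his payoff after R; otherwise his move is never reached."""
    if sigma1["R"] == 0:
        return True
    best = max(PAYOFFS[("R", a)][1] for a in ("U", "D"))
    return all(
        p == 0 or PAYOFFS[("R", a)][1] == best
        for a, p in (("U", prob_u), ("D", 1 - prob_u))
    )


def is_self_confirming(sigma1, prob_u, beliefs):
    """beliefs[a1] is the probability of U that player 1 holds when he
    plays a1. Only the belief held when playing R is tested, because
    player 2's information set is reached only after R."""
    for a1, prob in sigma1.items():
        if prob == 0:
            continue
        if not is_best_reply_1(a1, beliefs[a1]):
            return False
        if a1 == "R" and beliefs[a1] != prob_u:
            return False
    return player2_optimal(sigma1, prob_u)


def is_nash(sigma1, prob_u):
    """Nash: the same test, with beliefs that are correct everywhere."""
    return is_self_confirming(sigma1, prob_u, {"L": prob_u, "R": prob_u})


mixed = {"L": 0.5, "R": 0.5}
print(is_self_confirming(mixed, 1, {"L": 0, "R": 1}), is_nash(mixed, 1))
```

Under these assumed payoffs the mixed profile passes the self-confirming
test with heterogeneous beliefs but fails the Nash test, because the belief
that rationalizes L (that 2 plays D) is incorrect yet never tested.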

The next example shows that self-confirming equilibria in two-player
games can involve more than convexification over Nash equilibria. The idea
is that by embedding a randomization over equilibria as in Example 3 in the
second stage of a two-stage game, we can induce one player to randomize in
the first stage even though such randomization cannot arise in Nash
equilibrium. Moreover, this randomization may in turn cause the player's
opponent to take an action that would not be a best response without it.

Example 4: The extensive form shown in Figure 4 corresponds to a two-stage
game: In the first stage, players 1 and 2 play simultaneously, with
player 1 choosing U or D and player 2 choosing L, M, or R. Before the second
stage, these choices are revealed. In the second stage, only player 2 has a
move, choosing between R ("Reward"), costing both players 0, and P ("Punish"),
costing both players 10. The payoffs are additively separable between
periods.

We claim first that in any Nash equilibrium of this game, player 1 must
play a pure strategy and player 2 must play M in stage 1 with probability
zero. Let q(U) be the conditional probability that player 2 plays R given
that player 1 played U, and let q(D) be the probability of R conditional on
D. Notice that player 1's payoffs depend only on whether 2 chooses R or P.
If player 1 mixes between U and D, he must have the same expected payoff
from each, so in order for player 1 to randomize, it must be that
3 + 10q(U) = 2 + 10q(D), or q(D) - q(U) = 1/10. But if both U and D have
positive probability, then maximization by player 2 implies that he plays R,
so q(U) = q(D) = 1, a contradiction. We conclude that player 1 must play a
pure strategy, and consequently player 2 cannot play M.

Next, we consider correlated equilibrium, that is, a probability
distribution over strategies with the property that for each player i and
each $s_i$ with positive probability, playing $s_i$ is a best response to the
distribution of $s_{-i}$ conditional on $s_i$. If 1 plays U with probability
1, 2 must play L, while if he plays D with probability 1, 2 must play R. So in
this case the probability of M is zero. On the other hand, if both U and D
have positive probability and player 2 plays M with probability 1, then
player 1 correctly anticipates that player 2 will respond to both U and D
with M. In order for player 1 to play U in the first period, he must expect
to be punished with positive probability. In other words, the outcome
(U,M,P) must have positive probability. But this is impossible. If (U,M)
has positive probability, player 2 cannot follow (U,M) with a positive
probability of P. Thus, player 2 cannot play M with probability one, and
the probability of M is bounded away from one in all correlated equilibria
because the set of correlated equilibria is closed.

However, player 2 can play M with probability 1 in a self-confirming
equilibrium: Let player 1's strategy be $\sigma_1$ = (1/2 U, 1/2 D), and let
player 2's strategy $\sigma_2$ be "play M in the first stage and play R in the
second stage regardless of the first-period outcome." Player 2's strategy is
a best response to the strategy $\sigma_1$ that player 1 is actually playing,
and $U \in \mathrm{support}(\sigma_1)$ is a best response to $\sigma_2$. The
strategy $D \in \mathrm{support}(\sigma_1)$ is not a best response to
$\sigma_2$, but it is a best response to the belief that player 2 will play R
if player 1 plays D and P if player 1 plays U; and when player 1 plays D his
forecast of what would have happened if he had played U is not disconfirmed.
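
Player 1's side of this construction can be verified with a few lines of
arithmetic. The sketch below is an illustration rather than part of the
original argument: it uses the expressions 3 + 10q(U) and 2 + 10q(D) from the
Nash analysis above as player 1's expected payoffs (which may differ from the
payoffs in the figure by a common constant), and the function and variable
names are assumptions.

```python
# First-stage terms taken from the expressions 3 + 10 q(U) and 2 + 10 q(D).
STAGE1 = {"U": 3, "D": 2}


def u1(action, q):
    """Player 1's expected payoff (up to a common constant) when q[a] is the
    probability that player 2 plays R ("Reward") after first-stage action a."""
    return STAGE1[action] + 10 * q[action]


def is_best_response(action, q):
    return all(u1(action, q) >= u1(other, q) for other in STAGE1)


def belief_confirmed(action, belief, actual):
    """Playing `action` reveals only player 2's response to that action."""
    return belief[action] == actual[action]


actual = {"U": 1, "D": 1}         # sigma_2: reward after either action
belief_when_d = {"U": 0, "D": 1}  # belief: punish after U, reward after D

print(is_best_response("U", actual))                  # U against sigma_2
print(is_best_response("D", actual))                  # D against sigma_2
print(is_best_response("D", belief_when_d))           # D against the belief
print(belief_confirmed("D", belief_when_d, actual))   # on-path forecast
```

The incorrect part of the belief concerns player 2's response to U, which is
never observed when player 1 plays D, so nothing he sees contradicts it.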

Although consistent self-confirming equilibria need not be Nash
equilibria or even correlated equilibria, they are a special case of another
equilibrium concept from the literature, namely the extensive-form
correlated equilibria defined by Forges [1986]. These equilibria, which are
only defined for games whose information sets are ordered by precedence
(the usual case), are the Nash equilibria of an expanded game where an
"autonomous signalling device" is added at every information set, with the
joint distribution over signals independent of the actual play of the game
and common knowledge to the players, and the player on move at each
information set h is told the outcome of the corresponding device before he
chooses his move.³ Extensive-form correlated equilibrium includes Aumann's
[1974] correlated equilibrium as the special case where the signals at
information sets after stage 1 have one-point distributions and so contain
no new information. The possibility of signals at later dates allows the
construction of extensive-form correlated equilibria that are not correlated
equilibria, as in Myerson [1986]. Another example is based on the
self-confirming equilibrium we constructed in Example 4.

Example 4 revisited: We construct an extensive-form correlated equilibrium
with the same distribution over outcomes as the self-confirming equilibrium
in Example 4. The first-stage private signals describe play in that stage:
the signals (U,M) and (D,M) each have probability 1/2 in stage 1. The
strategies in stage 1 are to play the recommended action. The second-stage
public signal takes on two values, U and D. The strategy for player 2 in
stage 2 is to play P if player 1 played U and the second signal is D, and R
otherwise. The second-stage public signal is perfectly correlated with
player 1's first-stage private signal. Let us check that it is a Nash
equilibrium for the players to use the strategies their signals recommend:
Since player 1's signal reveals whether or not he will be punished for
playing U, player 1 finds it optimal to obey his signal. Player 2's first
signal is uninformative about player 1's stage 1 play, and so player 2
expects player 1 to randomize 1/2-1/2 in the first stage and thus plays M.
Player 2 cannot improve on the recommended strategies in the second stage
because he is only told to punish U when player 1's first signal was to play
D, and if player 1 obeys his signal this will not occur. The role of the
second signal is to tell player 2 when to punish player 1 without revealing
player 1's play at the beginning of the first stage; if player 1's play were
revealed at that point, this would remove player 2's incentive to play M.
Note that while the extensive-form correlated equilibrium and the
self-confirming equilibrium have the same distribution over outcomes, they
involve different distributions over strategies: In a self-confirming
equilibrium, if player 1 mixes between U and D, then player 2 must respond
to both U and D with R; player 1 sometimes plays D because he incorrectly
believes 2 will respond to U with P. In an extensive-form correlated
equilibrium, each player's predictions about his opponents' strategies are
on average correct, so if player 1 sometimes believes that player 2 responds
to U with P then player 2 must assign positive probability to a strategy
that does so.

Theorem 2: For each consistent self-confirming equilibrium of a game whose
information sets are ordered by precedence, there is an equivalent
extensive-form correlated equilibrium, that is, one with the same
distribution over terminal nodes.


Proof: Let $\sigma$ be consistent self-confirming, and for each
$s_i \in \mathrm{support}(\sigma_i)$, let $\mu_i(s_i)$ be beliefs satisfying
conditions (i) and (ii) of definition 1. We now expand the game by adding an
initial randomizing device whose realization is partially revealed as private
information at various information sets. A realization of this device is an
I-vector with the $i$th component a pair $(s_i, \pi^i_{-i})$ with
$s_i \in S_i$ and $\pi^i_{-i} = (\pi^i_j)_{j \ne i} \in \Pi_{-i}$. The $s_i$
follow the probability distribution $\sigma$ (and in particular $s_i$ and
$s_j$ are independent for $i \ne j$). The distribution of $\pi^i_{-i}$
conditional on s is $\mu_i(s_i)$. Intuitively, profile $\pi^i_{-i}$ is the way
player i expects to be "punished" if he deviates from strategy $s_i$.

Initially each player i is told $s_i$. Subsequent revelations also depend
upon s. At information sets on the path of s, $h \in H(s)$, no additional
information is revealed. At information sets that can be reached only if
two or more players deviate from s, no information is revealed. If
$h_j \in H(s'_i, s_{-i}) \setminus H(s)$, so $h_j$ is reached by player i's
deviation, and $j \ne i$, then player j is told $\pi^i_j(h_j)$.

In a consistent self-confirming equilibrium, if
$h_j \in H(s'_i, s_{-i}) \setminus H(s)$ and
$h_j \in H(s'_k, s_{-k}) \setminus H(s)$, then $h_j \in H(s_i) \cap H(s_k)$.
It follows that $\pi^i_j(h_j) = \pi^k_j(h_j)$ for $j \ne i, k$, so only one
distinct signal is received by j.

Now consider the strategy profile $\hat{s}$ for the expanded game in which
each player j plays $s_j$ except at information sets (in the expanded game)
where the signal $\pi^i_j(h_j)$ is received. At such information sets j plays
according to $\pi^i_j(h_j)$.

By construction, $\hat{s}$ induces the same distribution over terminal nodes
as $\sigma$ does. If player i's opponents follow $\hat{s}$, player i will
never receive an additional message, so player i is willing to play
$\pi^j_i(h_i)$ at the probability-zero information sets where player j
deviates and i is told $\pi^j_i(h_i)$. Moreover, given the initial message
$s_i$, opponents' play is drawn from $\mu_i(s_i)$, and $s_i$ is a best
response to $\mu_i(s_i)$ from condition (i) in the definition of
self-confirming equilibrium. Hence $\hat{s}$ is a Nash equilibrium of
the expanded game. ■


Corollary: In games with identified deviators, every self-confirming
equilibrium outcome is the outcome of an extensive-form correlated
equilibrium.

Remark: Note that not all outcomes of extensive-form correlated equilibria
are the outcomes of consistent self-confirming equilibria. In particular,
because self-confirming equilibrium supposes that players choose their
actions independently, the equilibrium path of play must be uncorrelated, so
not even every correlated equilibrium outcome can be attained. This
suggests that it might be possible to find an interesting and tighter
characterization of consistent self-confirming equilibria; we have not been
able to do so.

So far we have seen three ways in which self-confirming equilibria can
fail to be Nash. First, two players can have inconsistent beliefs about the
play of a third, as in example 1. Second, a player's subjective uncertainty
about his opponents' play may induce a correlated distribution on their
actions, even though he knows that their actual play is uncorrelated; this
was the case in example 2. Finally, the fact that each player can have
heterogeneous beliefs (that is, different beliefs may rationalize each
$s_i \in \mathrm{support}(\sigma_i)$) introduces a kind of extensive-form
correlation. Theorem 1 showed that in games with identified deviators,
self-confirming equilibria are consistent, thus precluding the kind of
non-Nash situation in example 1. The next theorem shows that the combination
of off-path correlation and heterogeneous beliefs encompasses all other ways
that self-confirming equilibria can fail to be Nash.

Theorem 3: Every consistent self-confirming equilibrium with independent,
unitary beliefs is equivalent to a Nash equilibrium.

Proof: Fix a consistent self-confirming equilibrium $\sigma$ with independent,
unitary beliefs. Thus for each player i, there is a $\mu_i$ such that
conditions (i) and (ii') of definition 1 are satisfied for all
$s_i \in \mathrm{support}(\sigma_i)$, and $\mu_i$ is a product measure on
$\Pi_{-i}$.

We will construct a new strategy profile $\sigma'$ by constructing its
equivalent behavior strategy profile $\pi'$. The idea is simply to change the
play of all players $j \ne i$ to that given by player i's beliefs at all the
information sets that can be reached if i unilaterally deviates from $\sigma$.
The unitary beliefs condition implies that "player i's beliefs" are a single
object; the requirement that the equilibrium is consistent ensures that this
process is well-defined, because if deviations by two distinct players can
lead to the same information set, then their beliefs at that information set
are identical. Finally, the condition of independence says that player i's
beliefs $\mu_i$ correspond to the behavior strategy profile $\pi'_{-i}$.
To define $\pi'$ explicitly requires some notation.


Let $\hat{H}^{-i} = \bigl[\bigcup_{j \ne i,\, s'_j} H(s'_j, \sigma_{-j})\bigr] \setminus H(\sigma)$
be the set of information sets that can be reached if exactly one player
$j \ne i$ deviates from $\sigma$, and let
$\tilde{H}^{-i} = H \setminus (\hat{H}^{-i} \cup H(\sigma))$ be the
information sets that can only be reached if player i or at least two other
players deviate. Let $\hat{H} = \bigcup_i \hat{H}^{-i}$ be all of the
information sets that can be reached if exactly one player deviates.

For all players j, let $\pi'_j(h_j) = \pi_j(h_j \mid \sigma_j)$ at all
$h_j \in H_j \setminus \hat{H}^{-j}$, and let
$\pi'_j(h_j) = \pi_j(h_j \mid \mu_i)$ at all $h_j$ such that for some player
$i \ne j$ and some $s'_i$, $h_j \in H(s'_i, \sigma_{-i})$.

To verify that this construction is well-defined, we note first that if
$h_j \in H_j \cap \hat{H}^{-j}$ then there must be some player $i \ne j$ and
some $s'_i$ such that $h_j \in H(s'_i, \sigma_{-i})$. Thus the algorithm above
specifies at least one value for $\pi'_j$ at each $h_j$. Next we check that it
assigns only one value to $\pi'_j$ at each $h_j$. If there are two players i
and k and strategies $s'_i$, $s'_k$ such that $h_j \in H(s'_i, \sigma_{-i})$
and $h_j \in H(s'_k, \sigma_{-k})$, then $h_j \in H(\sigma_i) \cap H(\sigma_k)$.
Because the equilibrium is consistent and unitary,
$\pi_j(h_j \mid \mu_i) = \pi_j(h_j \mid \mu_k) = \pi_j(h_j) = \pi'_j(h_j)$,
so $\pi'_j$ is well defined.

Finally we check that profile $\pi'$ is a Nash equilibrium. This verification
has two steps. First, we claim that the original behavior strategy $\pi_i$ is
a best response to the transformed profile $\pi'_{-i}$. Note first that for
all $j \ne i$, $\pi'_j(h_j) = \pi_j(h_j \mid \mu_i)$ at every information set
that can be reached if all players but i follow profile $\sigma$; this implies
that $\pi'_j(h_j) = \pi_j(h_j \mid \mu_i)$ at every information set that can
be reached if all players but i follow $\pi'$. Next recall that player i's
expected payoff to any action is unaffected by changes in his beliefs at
information sets that cannot be reached if no other player deviates, and
finally use the assumption of independent beliefs to conclude that player i's
payoff to each strategy $s'_i$ under beliefs $\mu_i$ can be computed using the
product of the corresponding marginal distributions $\pi_j(h_j \mid \mu_i)$.
It then follows that player i's expected payoff to each $s'_i$ is the same
when he knows the opponents' strategies are $\pi'_{-i}$ as when his beliefs
are $\mu_i$, so that $\pi_i$ is a best response to $\pi'_{-i}$. To complete
the proof, we note that $\pi_i$ and $\pi'_i$ differ only at information sets
that cannot be reached unless some player $j \ne i$ deviates from $\pi'_j$, so
that $\pi'_i$ is a best response to $\pi'_{-i}$. ■

Corollary: In two-player games, every self-confirming equilibrium with
unitary beliefs is Nash.⁴


5.    Generalizations and Extensions

Self-confirming equilibrium describes a situation in which players know
their own payoff functions, the distribution over nature's moves, and the
strategy spaces of their opponents; the only uncertainty players have is
about which strategies their opponents will play. Moreover, the assumption
that players' beliefs are correct along the path of play implicitly supposes
that players observe the terminal node of the game at the end of each play.
These informational assumptions are what underlie our results relating
self-confirming equilibrium to standard solution concepts, but in some
cases, these informational assumptions are too strong. Thus it is of some
interest to consider how the assumptions might be relaxed.

Battigalli and Guaitoli [1988] and Rubinstein and Wolinsky [1990] replace
our assumption that players observe the terminal node of the game with a
more general formulation of what the players observe when the game is
played. In our view these observations should not be more informative than
the terminal node of the game, and should at least allow each player to
compute his own payoff. (In these models, where each player i observes
signal $g_i(s)$ when the profile s is played, this constraint would require
player i's utility function $u_i$ to be measurable with respect to $g_i$.) It
would be interesting to see a characterization of self-confirming
equilibrium for the case in which each player's end-of-stage information is
precisely his own payoff; the key would be finding a tractable description
of how much information the payoffs convey. Another interesting case is
that of games of incomplete information, with the assumption that each
player observes the entire sequence of play and his own type, but not the
types of his opponents. We conjecture that if each player's payoff function
does not depend on his opponents' types, the set of self-confirming
equilibria is the same whether or not the opponents' types are observed at
the end of each round.

The other informational assumptions of self-confirming equilibrium can
be relaxed as well. It is easy to generalize self-confirming equilibrium to
allow for players to not know the distribution of nature's moves; see our
[1990] working paper for the details. Allowing for the possibility that
players do not know the extensive-form structure of the game is more
difficult. One issue is that, when the extensive form is unknown, players
may believe that some opponents can condition their play on information the
opponents cannot in fact possess. Also, in some formulations of the
players' inference processes, players may become convinced that their
opponents' play is influenced by the actions the player means to take at
information sets that are in fact not reached. Fudenberg and Kreps [1991]
discuss some of these problems, but are unable to provide a satisfactory
resolution of them.


Footnotes

1
We thank Robert Aumann for convincing us of the importance of this kind of
subjective correlation.

2
See Fudenberg and Tirole [1991] for a more detailed explanation of
multi-stage games; we introduced the definition in Fudenberg and Levine
[1983]. Note that the extensive form in example 2 below is not a
multi-stage game with observed actions, but is a game with observed
deviators. Moreover, splitting player 1's information into two consecutive
choices, the first one being A or -A, yields a multi-stage game with
observed actions that has the same reduced normal form and the same set of
self-confirming equilibria. This emphasizes that from the viewpoint of
self-confirming equilibria, identified deviators is the more fundamental
property.

3
Forges shows, in the spirit of the revelation principle, that it suffices
to work with a smaller set of signalling devices. She also defines
"communications equilibria," which allow the players to send messages in the
course of play that influence subsequent signals.

4
This is proved directly in Fudenberg and Kreps [1991].